91 research outputs found

    The topology of the bacterial co-conserved protein network and its implications for predicting protein function

    Background: Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in compact bacterial genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins. Results: Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network, which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins. Conclusion: Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown co-conservation-based networks to exhibit a scale-free topology, as expected for biological networks. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins.
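As a rough illustration of the co-conservation idea in this abstract (proteins gained or lost together across genomes are linked, and node connectivity is then examined), here is a minimal sketch. It is not the authors' method: the Jaccard similarity measure, the 0.75 threshold, the function names, and the toy profiles are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two presence profiles (sets of genome ids)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_conservation_network(profiles, threshold=0.75):
    """Link proteins whose presence profiles are similar enough.

    profiles maps protein -> set of genomes in which it is present.
    The similarity measure and threshold are illustrative assumptions.
    """
    network = {protein: set() for protein in profiles}
    for p, q in combinations(sorted(profiles), 2):
        if jaccard(profiles[p], profiles[q]) >= threshold:
            network[p].add(q)
            network[q].add(p)
    return network


def degrees(network):
    """Node connectivity: number of neighbors per protein."""
    return {protein: len(neighbors) for protein, neighbors in network.items()}


# Hypothetical toy data: four proteins across six genomes.
profiles = {
    "protA": {"g1", "g2", "g3", "g4"},
    "protB": {"g1", "g2", "g3", "g4"},
    "protC": {"g1", "g2", "g3"},
    "protD": {"g5", "g6"},
}
network = co_conservation_network(profiles)
print(degrees(network))
```

On real data, the resulting degree distribution could then be compared across functional categories, which is the kind of connectivity signal the abstract describes using for function prediction.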

    Spatial and Temporal Analysis of Gene Expression during Growth and Fusion of the Mouse Facial Prominences

    Orofacial malformations resulting from genetic and/or environmental causes are frequent human birth defects yet their etiology is often unclear because of insufficient information concerning the molecular, cellular and morphogenetic processes responsible for normal facial development. We have, therefore, derived a comprehensive expression dataset for mouse orofacial development, interrogating three distinct regions – the mandibular, maxillary and frontonasal prominences. To capture the dynamic changes in the transcriptome during face formation, we sampled five time points between E10.5–E12.5, spanning the developmental period from establishment of the prominences to their fusion to form the mature facial platform. Seven independent biological replicates were used for each sample ensuring robustness and quality of the dataset. Here, we provide a general overview of the dataset, characterizing aspects of gene expression changes at both the spatial and temporal level. Considerable coordinate regulation occurs across the three prominences during this period of facial growth and morphogenesis, with a switch from expression of genes involved in cell proliferation to those associated with differentiation. An accompanying shift in the expression of polycomb and trithorax genes presumably maintains appropriate patterns of gene expression in precursor or differentiated cells, respectively. Superimposed on the many coordinated changes are prominence-specific differences in the expression of genes encoding transcription factors, extracellular matrix components, and signaling molecules. Thus, the elaboration of each prominence will be driven by particular combinations of transcription factors coupled with specific cell:cell and cell:matrix interactions. The dataset also reveals several prominence-specific genes not previously associated with orofacial development, a subset of which we externally validate. Several of these latter genes are components of bidirectional transcription units that likely share cis-acting sequences with well-characterized genes. Overall, our studies provide a valuable resource for probing orofacial development and a robust dataset for bioinformatic analysis of spatial and temporal gene expression changes during embryogenesis.

    Combined deletion of Xrcc4 and Trp53 in mouse germinal center B cells leads to novel B cell lymphomas with clonal heterogeneity

    Background: Activated B lymphocytes harbor programmed DNA double-strand breaks (DSBs) initiated by activation-induced deaminase (AID) and repaired by non-homologous end-joining (NHEJ). While it has been proposed that these DSBs during secondary antibody gene diversification are the primary source of chromosomal translocations in germinal center (GC)-derived B cell lymphomas, this point has not been directly addressed due to the lack of proper mouse models. Methods: In the current study, we establish a unique mouse model by specifically deleting an NHEJ gene, Xrcc4, and a cell cycle checkpoint gene, Trp53, in GC B cells, which results in the spontaneous development of B cell lymphomas that possess features of GC B cells. Results: We show that these NHEJ-deficient lymphomas harbor translocations frequently targeting immunoglobulin (Ig) loci. Furthermore, we found that Ig translocations were associated with distinct mechanisms, probably caused by AID- or RAG-induced DSBs. Intriguingly, the AID-associated Ig loci translocations target either the c-myc or the Pvt-1 locus, whereas the partners of RAG-associated Ig translocations were scattered randomly in the genome. Lastly, these NHEJ-deficient lymphomas harbor complicated genomes including segmental translocations and exhibit a high level of ongoing DNA damage and clonal heterogeneity. Conclusions: We propose that combined NHEJ and p53 defects may serve as an underlying mechanism for a high level of genomic complexity and clonal heterogeneity in cancers.

    Reactive oxygen-related diseases: therapeutic targets and emerging clinical indications

    SIGNIFICANCE: Enhanced levels of reactive oxygen species (ROS) have been associated with different disease states. Most attempts to validate and exploit these associations by chronic antioxidant therapies have provided disappointing results. Hence, the clinical relevance of ROS is still largely unclear. RECENT ADVANCES: We are now beginning to understand the reasons for these failures, which reside in the many important physiological roles of ROS in cell signaling. To exploit ROS therapeutically, it would be essential to define and treat the disease-relevant ROS at the right moment and leave physiological ROS formation intact. This breakthrough seems now within reach. CRITICAL ISSUES: Rather than antioxidants, a new generation of protein targets for classical pharmacological agents includes ROS-forming or toxifying enzymes or proteins that are oxidatively damaged and can be functionally repaired. FUTURE DIRECTIONS: Linking these target proteins in future to specific disease states and providing in each case proof of principle will be essential for translating the oxidative stress concept into the clinic. Antioxid. Redox Signal. 23, 1171-1185.

    Improving protein function prediction methods with integrated literature data

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder, but co-occurrence data still proves beneficial. Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions in the interaction network that graph-theoretic function prediction algorithms can exploit.

    Predicting protein linkages in bacteria: Which method is best depends on task

    Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.

    Sputum is a surrogate for bronchoalveolar lavage for monitoring Mycobacterium tuberculosis transcriptional profiles in TB patients

    Summary: Pathogen-targeted transcriptional profiling in human sputum may elucidate the physiologic state of Mycobacterium tuberculosis (M. tuberculosis) during infection and treatment. However, whether M. tuberculosis transcription in sputum recapitulates transcription in the lung is uncertain. We therefore compared M. tuberculosis transcription in human sputum and bronchoalveolar lavage (BAL) samples from 11 HIV-negative South African patients with pulmonary tuberculosis. We additionally compared these clinical samples with in vitro log phase aerobic growth and hypoxic non-replicating persistence (NRP-2). Of 2179 M. tuberculosis transcripts assayed in sputum and BAL via multiplex RT-PCR, 194 (8.9%) had a p-value <0.05, but none were significant after correction for multiple testing. Categorical enrichment analysis indicated that expression of the hypoxia-responsive DosR regulon was higher in BAL than in sputum. M. tuberculosis transcription in BAL and sputum was distinct from both aerobic growth and NRP-2, with a range of 396–1020 transcripts significantly differentially expressed after multiple testing correction. Collectively, our results indicate that M. tuberculosis transcription in sputum approximates M. tuberculosis transcription in the lung. Minor differences between M. tuberculosis transcription in BAL and sputum suggested lower oxygen concentrations or higher nitric oxide concentrations in BAL. M. tuberculosis-targeted transcriptional profiling of sputa may be a powerful tool for understanding M. tuberculosis pathogenesis and monitoring treatment responses in vivo.
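The abstract reports that 194 of 2179 transcripts had p < 0.05 yet none survived multiple-testing correction. For scale, 5% of 2179 tests is about 109 nominal hits expected by chance alone. The abstract does not say which correction was used; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure purely for illustration, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR alpha.

    Benjamini-Hochberg is an assumed choice here; the abstract does not
    name the correction that was actually applied.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values: three are nominally below 0.05,
# but none passes the rank-dependent threshold.
nominal_only = [0.04, 0.03, 0.045, 0.2, 0.6]
print(benjamini_hochberg(nominal_only))

# Hypothetical p-values where the two smallest survive correction.
some_survive = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg(some_survive))
```

The first toy input mirrors the pattern in the abstract: several nominally significant tests, none significant after correction.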

    Use of intervention mapping to adapt a health behavior change intervention for endometrial cancer survivors: The shape-up following cancer treatment program

    Background: About 80% of endometrial cancer survivors (ECS) are overweight or obese and have sedentary behaviors. Lifestyle behavior interventions are promising for improving dietary and physical activity behaviors, but the constructs associated with their effectiveness are often inadequately reported. The aim of this study was to systematically adapt an evidence-based behavior change program to improve healthy lifestyle behaviors in ECS. Methods: Following a review of the literature, focus groups and interviews were conducted with ECS (n = 16). An intervention mapping protocol was used for the program adaptation, which consisted of six steps: a needs assessment, formulation of matrices of change objectives, selection of theoretical methods and practical applications, program production, adoption and implementation planning, and evaluation planning. Social Cognitive Theory and Control Theory guided the adaptation of the intervention. Results: The process consisted of eight 90-min group sessions focusing on shaping outcome expectations, knowledge, self-efficacy, and goals about healthy eating and physical activity. The adapted performance objectives included establishment of regular eating, balanced diet, and portion sizes, reduction in sedentary behaviors, increase in lifestyle and organized activities, formulation of a discrepancy-reducing feedback loop for all above behaviors, and trigger management. Information on managing fatigue and bowel issues unique to ECS was added. Conclusions: Systematic intervention mapping provided a framework to design a cancer survivor-centered lifestyle intervention. ECS welcomed the intervention and provided essential feedback for its adaptation. The program has been evaluated through a randomized controlled trial.

    Biomedical Discovery Acceleration, with Applications to Craniofacial Development

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.

    The Peripheral Blood Transcriptome Identifies the Presence and Extent of Disease in Idiopathic Pulmonary Fibrosis

    Rationale: Peripheral blood biomarkers are needed to identify and determine the extent of idiopathic pulmonary fibrosis (IPF). Current physiologic and radiographic prognostic indicators diagnose IPF too late in the course of disease. We hypothesize that peripheral blood biomarkers will identify disease in its early stages, and facilitate monitoring for disease progression. Methods: Gene expression profiles of peripheral blood RNA from 130 IPF patients were collected on Agilent microarrays. Significance analysis of microarrays (SAM) with a false discovery rate (FDR) of 1% was utilized to identify genes that were differentially expressed in samples categorized based on percent predicted DLCO and FVC. Main Measurements and Results: At 1% FDR, 1428 genes were differentially expressed in mild IPF (DLCO >65%) compared to controls and 2790 transcripts were differentially expressed in severe IPF (DLCO <35%) compared to controls. When categorized by percent predicted DLCO, SAM demonstrated 13 differentially expressed transcripts between mild and severe IPF (<5% FDR). These include CAMP, CEACAM6, CTSG, DEFA3 and A4, OLFM4, HLTF, PACSIN1, GABBR1, IGHM, and 3 unknown genes. Principal component analysis (PCA) was performed to determine outliers based on severity of disease, and demonstrated 1 mild case to be clinically misclassified as a severe case of IPF. No differentially expressed transcripts were identified between mild and severe IPF when categorized by percent predicted FVC. Conclusions: These results demonstrate that the peripheral blood transcriptome has the potential to distinguish normal individuals from patients with IPF, as well as extent of disease when samples were classified by percent predicted DLCO, but not FVC.